3574 results found.
Written
Corpus
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
<Not Specified>
Size:
22.10G tokens
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | OpenSubtitles2018 | /N |
Documentation:
Yes, on the website.
Written
Lexicon
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
CreativeCommons Attribution 4.0 International
Size:
41 GByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | word2word | /N |
Documentation:
Yes, on the website.
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Dolgan English Russian
Availability:
Freely Available
License:
https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
Size:
None
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Processing Language Resources of Under-Resourced and Endangered Languages for the Generation of Augmentative Alternative Communication Boards
- Paper track: Speech/oral presentation
- Paper status: Accept Poster+DemoSuggested
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Anne Ferger | INEL Dolgan Corpus 1.0 | /N |
Documentation:
https://corpora.uni-hamburg.de/hzsk/de/islandora/object/file:dolgan-1.0_INEL_Dolgan_Corpus_1.0_User_Documentation/datastream/PDF/INEL_Dolgan_Corpus.pdf
Written
Evaluation Data
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CreativeCommons
Size:
1M
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: Aligning Wikipedia with WordNet: a Review and Evaluation of Different Techniques
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Antoni Oliver | pwnalign | /N |
Documentation:
None
Written
Corpus
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
- Paper title: Lexicogrammatic translationese across two targets and competence levels
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ekaterina Lapshinova-Koltunski | VARTRA | /N |
Documentation:
None
Written
Corpus
Language Type:
Bilingual
Languages:
English Russian
Availability:
Freely Available
License:
CreativeCommons
Size:
2.3 million tokens
Production Status:
Existing-updated
Use:
Document Classification, Text categorisation
- Paper title: Lexicogrammatic translationese across two targets and competence levels
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ekaterina Lapshinova-Koltunski | RusLTC | /N |
Documentation:
https://www.rus-ltc.org/static/html/about.html (in English and Russian)
Written
Corpus
Language Type:
Multilingual
Languages:
Chinese English Japanese Others
Availability:
Freely Available
License:
Size:
353,055 entries
Production Status:
Newly created-finished
Use:
Spelling Correction, Grammatical Error Correction
- Paper title: GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Masato Hagiwara | GitHub Typo Corpus | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
347 dialogs Other
Production Status:
Existing-used
Use:
Dialogue
- Paper title: Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Eugénio Ribeiro | Parameterized & Annotated CMU Let's Go Database (LEGO) | /N |
Documentation:
None
Written
Corpus
Language Type:
Bilingual
Languages:
Dutch English
Availability:
Freely Available
License:
Size:
26 Dialogs Other
Production Status:
Existing-used
Use:
Dialogue
- Paper title: Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Eugénio Ribeiro | The DialogBank | /N |
Documentation:
None
Written
Corpus
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
347 Dialogs Other
Production Status:
Newly created-in progress
Use:
Dialogue
- Paper title: Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Eugénio Ribeiro | LEGO-ISO | /N |
Documentation:
None




